[RORDEV-1944] investigating socket hangs up issue by Dzuming · Pull Request #82 · beshu-tech/readonlyrest-e2e-tests

Dzuming · 2026-04-30T19:01:51Z

Summary by CodeRabbit

Tests
- Improved E2E test synchronization for greater stability.
- Added Elasticsearch connectivity monitoring during E2E runs to surface cluster and environment connectivity issues.
Chores
- Integrated the connectivity monitor into automated test workflows to ensure it starts/stops with test jobs for better visibility and troubleshooting.

coderabbitai · 2026-04-30T19:02:05Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a Cypress synchronization helper and integrates a new GitHub composite action that can start/stop a background Elasticsearch connectivity monitor (host, Docker container, Kubernetes pod) into E2E workflow jobs.

Changes

Tenancy Test Synchronization

Layer / File(s)	Summary
Page Object Helper `e2e-tests/cypress/support/page-objects/IndexManagement.ts`	Adds `static waitingForSectionLoadingFinish()` which logs "Waiting for section loading" and asserts `sectionLoading` test subject does not exist within a 30s timeout.
Test Integration `e2e-tests/cypress/e2e/Tenancy.cy.ts`	Imports `IndexManagement` and calls `IndexManagement.waitingForSectionLoadingFinish()` after login initialization and before executing the post-login callback to synchronize the test.

ES Connectivity Monitoring Infrastructure

Layer / File(s)	Summary
Monitoring Action `.github/es-connectivity-monitor/action.yml`	New composite GitHub Action with `action` input (`start`/`stop`) plus `es_url`, `es_user`, `es_password` inputs. `start` spawns a detached loop (every 5s) that logs host-side `/_cluster/health` via `curl`, performs TCP reachability checks from a detected local `kbn` Docker container and from a discovered Kibana Kubernetes pod (resolving an ES service name), tails recent container logs, writes output to `/tmp/es-connectivity-monitor.log`, and saves the background PID to `/tmp/es-connectivity-monitor.pid`. `stop` attempts to kill the PID and prints the monitor log.
Workflow Integration `.github/workflows/trigger-e2e-tests.yml`	Adds "Start ES connectivity monitor" steps (after Docker memory monitor) to `master-e2e-tests`, `master-bootstrap-tests`, and `develop-e2e-tests` jobs, plus corresponding `if: always()` "Stop ES connectivity monitor" steps to ensure the monitor is stopped.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Runner as "GH Actions runner"
    participant Host as "Host (curl)"
    participant Docker as "Docker kbn container"
    participant KubePod as "Kibana pod (k8s)"
    participant ES as "Elasticsearch"

    Runner->>Host: start es-connectivity-monitor (composite action)
    Host->>Host: spawn background loop (every 5s) -> log to /tmp/es-connectivity-monitor.log
    Host->>ES: curl inputs.es_url/_cluster/health (record status/timing)
    Host->>Docker: exec TCP probe to es-ror:9200 and tail container logs
    Host->>KubePod: kubectl exec TCP probe to resolved ES service:9200
    Note right of Host: loop repeats until stopped
    Runner->>Host: stop es-connectivity-monitor (always)
    Host->>Host: kill PID (best-effort), print /tmp/es-connectivity-monitor.log

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I hopped through CI and test-time light,
Waiting for sections till they readjust right,
I listen to ES, from host, container, and pod,
Logging the pulses that keep tests from odd,
A tiny rabbit watcher — steady and bright.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	⚠️ Warning	The PR title references a ticket (RORDEV-1944) about investigating socket hangs, but the actual changes focus on adding synchronization waits for index management section loading and monitoring Elasticsearch connectivity.	Align the title with the main changes: consider 'Add index management section loading synchronization and ES connectivity monitoring' or similar to accurately reflect the implemented functionality.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-tenancy-switch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e-tests/cypress/support/page-objects/IndexManagement.ts`:
- Around line 113-114: The current spinner waits assume the loader will both
appear and then be hidden, which flaps when loading has already finished or the
element is removed; replace the two strict gets with a resilient check against
the '[data-test-subj="sectionLoading"]' element: attempt to get it with a
timeout, then branch on its existence/visibility so the helper passes if the
loader is already absent, if it appears then waits until it is not
visible/removed, and if it is present but hidden immediately treat that as
success; target the selector '[data-test-subj="sectionLoading"]' when
implementing this logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4dacbc3c-dec7-4a95-943f-11b47c2c9076

📥 Commits

Reviewing files that changed from the base of the PR and between f4c12e5 and 851f4eb.

📒 Files selected for processing (2)

e2e-tests/cypress/e2e/Tenancy.cy.ts
e2e-tests/cypress/support/page-objects/IndexManagement.ts

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

.github/workflows/trigger-e2e-tests.yml (2)
64-68: ES connectivity monitor log is not included in S3 artifact uploads.

The "Stop ES connectivity monitor" step prints the log to stdout only. The subsequent "S3 Upload Videos & show logs" step (on failure) uploads videos and Docker logs, but the ES connectivity log at /tmp/es-connectivity-monitor.log is not explicitly included. If you want to correlate ES connectivity loss with test failures, consider having .github/upload-videos (or a dedicated step) pick up /tmp/es-connectivity-monitor.log.

Also applies to: 125-129, 186-190
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/trigger-e2e-tests.yml around lines 64 - 68, The ES
connectivity monitor's log (/tmp/es-connectivity-monitor.log) is only printed to
stdout in the "Stop ES connectivity monitor" step and isn't included in the
artifacts; update the upload step (the "S3 Upload Videos & show logs" step or
the reusable action .github/upload-videos) to explicitly collect and upload
/tmp/es-connectivity-monitor.log (or add a dedicated step after the "Stop ES
connectivity monitor" step to upload that file to S3/artifacts), ensuring the
file path is added to the list of uploaded logs so ES connectivity traces are
available alongside videos and Docker logs.
39-42: Consider explicitly passing es_url, es_user, and es_password to make configuration explicit rather than relying on action defaults.

The workflow currently relies on the monitor's default values (https://localhost:9200, kibana/kibana credentials), which do match the port mappings in both Docker and ECK test environments. However, explicitly passing these values would improve clarity and reduce fragility if future test infrastructure changes the port or credentials. This is not currently causing issues, but makes the configuration harder to audit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/trigger-e2e-tests.yml around lines 39 - 42, The ES
connectivity monitor step ("Start ES connectivity monitor" using
./.github/es-connectivity-monitor) currently relies on the action's defaults;
update that step to explicitly pass the es_url, es_user, and es_password inputs
(e.g., set with: es_url: "http://localhost:9200", es_user: "kibana",
es_password: "kibana" or the appropriate test creds) so configuration is
explicit and not dependent on action defaults; ensure the input names match the
action's input parameters and keep the existing with: action: start entry.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/es-connectivity-monitor/action.yml:
- Around line 33-35: The curl run step currently interpolates inputs with `${{
inputs.es_user }}`, `${{ inputs.es_password }}` and `${{ inputs.es_url }}`
directly into the shell command (the `run:` block) which is unsafe; update the
step to bind each input to environment variables (e.g., ES_USER, ES_PASSWORD,
ES_URL) via the step's `env:` and then reference them inside the `run:` script
as `$ES_USER`, `$ES_PASSWORD`, and `$ES_URL`; apply the same change to the other
similar curl steps (those using the same `${{ inputs.* }}` patterns on lines
noted) to avoid command injection and accidental secret leakage.
- Line 45: The workflow currently hardcodes the Docker Elasticsearch target URL
("https://es-ror:9200/_cluster/health?pretty") which isn't configurable; add a
new optional input (e.g. es_docker_host with default "es-ror") to action.yml and
replace the hardcoded host in the probe step with the constructed URL using that
input (e.g. https://${{ inputs.es_docker_host }}:9200/_cluster/health?pretty) so
the Docker probe becomes configurable and consistent with the existing es_url
input; ensure input name matches casing used elsewhere and update any references
to es_ror/es-ror to the new input name.

---

Nitpick comments:
In @.github/workflows/trigger-e2e-tests.yml:
- Around line 64-68: The ES connectivity monitor's log
(/tmp/es-connectivity-monitor.log) is only printed to stdout in the "Stop ES
connectivity monitor" step and isn't included in the artifacts; update the
upload step (the "S3 Upload Videos & show logs" step or the reusable action
.github/upload-videos) to explicitly collect and upload
/tmp/es-connectivity-monitor.log (or add a dedicated step after the "Stop ES
connectivity monitor" step to upload that file to S3/artifacts), ensuring the
file path is added to the list of uploaded logs so ES connectivity traces are
available alongside videos and Docker logs.
- Around line 39-42: The ES connectivity monitor step ("Start ES connectivity
monitor" using ./.github/es-connectivity-monitor) currently relies on the
action's defaults; update that step to explicitly pass the es_url, es_user, and
es_password inputs (e.g., set with: es_url: "http://localhost:9200", es_user:
"kibana", es_password: "kibana" or the appropriate test creds) so configuration
is explicit and not dependent on action defaults; ensure the input names match
the action's input parameters and keep the existing with: action: start entry.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 015c56f8-d984-43b7-9067-2264c7bbafc4

📥 Commits

Reviewing files that changed from the base of the PR and between 4421f33 and 5b908d0.

📒 Files selected for processing (2)

.github/es-connectivity-monitor/action.yml
.github/workflows/trigger-e2e-tests.yml

coderabbitai · 2026-05-07T08:15:32Z

+            curl -sk -u "${{ inputs.es_user }}:${{ inputs.es_password }}" \
+              "${{ inputs.es_url }}/_cluster/health?pretty" \
+              -o /tmp/es-health.json -w "HTTP %{http_code} in %{time_total}s\n" 2>&1 || true


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid direct ${{ inputs.* }} interpolation into shell run: blocks — use env: vars instead.

${{ }} expressions in composite-action run: blocks are handled via variable interpolation before shell execution, so embedding them directly is vulnerable to command injection if a caller ever passes a value containing shell metacharacters.

Additionally, the raw input values are written into the runner's temporary shell script on disk before execution, meaning es_password can appear in runner debug traces.

The safe pattern — per GitHub's own composite-action documentation — is to bind inputs to env: variables on each step and reference them as $ENV_VAR inside the shell script.

🔒 Proposed fix — bind credentials via env:

- name: Start monitor if: inputs.action == 'start' shell: bash + env: + ES_USER: ${{ inputs.es_user }} + ES_PASSWORD: ${{ inputs.es_password }} + ES_URL: ${{ inputs.es_url }} run: | ( while true; do echo "===== $(date -Is) =====" echo "== ES connectivity from host ==" - curl -sk -u "${{ inputs.es_user }}:${{ inputs.es_password }}" \ - "${{ inputs.es_url }}/_cluster/health?pretty" \ + curl -sk -u "${ES_USER}:${ES_PASSWORD}" \ + "${ES_URL}/_cluster/health?pretty" \ -o /tmp/es-health.json -w "HTTP %{http_code} in %{time_total}s\n" 2>&1 || true cat /tmp/es-health.json 2>/dev/null || true echo echo "== ES connectivity from KBN Docker container ==" KBN_CONTAINER=$(docker ps --filter "name=kbn" --format "{{.Names}}" 2>/dev/null | head -1) if [ -n "$KBN_CONTAINER" ]; then echo "Container: $KBN_CONTAINER" docker exec "$KBN_CONTAINER" \ - curl -sk -u "${{ inputs.es_user }}:${{ inputs.es_password }}" \ + curl -sk -u "${ES_USER}:${ES_PASSWORD}" \ https://es-ror:9200/_cluster/health?pretty \ -w "HTTP %{http_code} in %{time_total}s\n" 2>&1 || true ... if [ -n "$ES_SVC" ]; then kubectl exec -n "$KBN_NS" "$KBN_NAME" -- \ - curl -sk -u "${{ inputs.es_user }}:${{ inputs.es_password }}" \ + curl -sk -u "${ES_USER}:${ES_PASSWORD}" \ "https://${ES_SVC}:9200/_cluster/health?pretty" \

Also applies to: 44-46, 64-66

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/es-connectivity-monitor/action.yml around lines 33 - 35, The curl run step currently interpolates inputs with `${{ inputs.es_user }}`, `${{ inputs.es_password }}` and `${{ inputs.es_url }}` directly into the shell command (the `run:` block) which is unsafe; update the step to bind each input to environment variables (e.g., ES_USER, ES_PASSWORD, ES_URL) via the step's `env:` and then reference them inside the `run:` script as `$ES_USER`, `$ES_PASSWORD`, and `$ES_URL`; apply the same change to the other similar curl steps (those using the same `${{ inputs.* }}` patterns on lines noted) to avoid command injection and accidental secret leakage.

coderabbitai · 2026-05-07T08:15:32Z

+              echo "Container: $KBN_CONTAINER"
+              docker exec "$KBN_CONTAINER" \
+                curl -sk -u "${{ inputs.es_user }}:${{ inputs.es_password }}" \
+                https://es-ror:9200/_cluster/health?pretty \


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded Docker ES hostname es-ror:9200 is not configurable.

The Docker probe target (https://es-ror:9200) is baked in while the host probe target is already an input (es_url). If the internal Docker service name ever changes, this probe silently becomes a no-op (swallowed by || true). Consider exposing it as an optional input (e.g. es_docker_host, defaulting to es-ror) for consistency and future-proofing.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/es-connectivity-monitor/action.yml at line 45, The workflow currently hardcodes the Docker Elasticsearch target URL ("https://es-ror:9200/_cluster/health?pretty") which isn't configurable; add a new optional input (e.g. es_docker_host with default "es-ror") to action.yml and replace the hardcoded host in the probe step with the constructed URL using that input (e.g. https://${{ inputs.es_docker_host }}:9200/_cluster/health?pretty) so the Docker probe becomes configurable and consistent with the existing es_url input; ensure input name matches casing used elsewhere and update any references to es_ror/es-ror to the new input name.

coutoPL · 2026-05-18T09:03:45Z

Nothing to merge here. The fix was tested and works well

waiting for index management section loading

851f4eb

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread e2e-tests/cypress/support/page-objects/IndexManagement.ts Outdated

Dzuming and others added 4 commits May 1, 2026 06:07

another fix

9d20aac

remove only

4421f33

revert it.only

1b1a6da

connectivity monitor

5b908d0

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

coutoPL added 7 commits May 8, 2026 09:11

connectivity monitor

da7afda

connectivity monitor

78c0ddb

Merge remote-tracking branch 'origin/master' into fix-tenancy-switch

bdd8a2b

debug level

33ef1bf

no error verobsity level for kbn users

12b52d5

test ROR KBN 1.70.0-pre7

f541564

test ROR KBN 1.70.0-pre7

7b4b1b5

coutoPL changed the title ~~waiting for index management section loading~~ [RORDEV-1944] investigating socket hangs up issue May 18, 2026

coutoPL mentioned this pull request May 18, 2026

waiting for index management section loading #91

Merged

coutoPL closed this May 18, 2026

coutoPL deleted the fix-tenancy-switch branch May 18, 2026 09:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RORDEV-1944] investigating socket hangs up issue#82

[RORDEV-1944] investigating socket hangs up issue#82
Dzuming wants to merge 12 commits into
masterfrom
fix-tenancy-switch

Dzuming commented Apr 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 7, 2026

Uh oh!

coderabbitai Bot May 7, 2026

Uh oh!

coutoPL commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Dzuming commented Apr 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

coutoPL commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Dzuming commented Apr 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading